Automatic Extension of Semantic Lexicons with a Bootstrapping Algorithm
Abstract
This work investigates and extends a bootstrapping approach that makes it possible to extend high-quality lexical resources with the help of large corpora. The emphasis lies on the extraction of lexical-semantic information and word meaning, which are fundamental components of advanced applications such as information retrieval, text summarization and the Semantic Web. The approach is based on co-occurrences of verbs with nouns in a specific context such as object, subject or certain theta-roles. The experiments use a large parsed corpus and are compared with earlier investigations of adjectives and nouns in order to determine whether adjective modifiers or certain verb-noun relations are more suitable for classifying nouns with respect to their semantic characteristics. The algorithm starts with several seed words whose characteristics are known and which stand in certain relations to the respective verb or adjective in the sentence. Unknown nouns that co-occur in the same context then inherit some of the characteristics of these seed words. The aim is to find the most effective relations for enhancing semantic resources for nouns in general and to apply the findings to the German lexicon HaGenLex automatically and with weak supervision. The findings will help to extend existing lexicons, which is necessary because there are still no sufficiently large semantic lexicons for German. The first chapter outlines computational linguistics and corpus linguistics and explains the semantic structure used in the lexicon. Furthermore, the basics of bootstrapping and of similar approaches are presented to clarify the scientific context of this work and to show the general applicability of such an approach. Chapter 2 presents the pre-processing steps and explains the algorithm through theory and examples. It also shows the parameters whose settings can change the results substantially. Chapter 3 describes the experiments in detail.
While experiments with adjectives have been carried out before and are compared here with new experiments, the extension to verbal relations such as subject, object and theta-roles has not previously been examined for German. Extensive experiments identify effective relations for bootstrapping and optimal new parameter combinations. The chapter ends with the combination of the three main relations, which outperforms the separately obtained solutions and increases precision significantly. The last chapter gives an outlook with suggestions for further improvements and extensions and outlines a novel approach that combines genetic algorithms and bootstrapping.
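The seed-based procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `bootstrap`, the threshold `min_shared`, the class labels "food" and "human" and the toy dependency triples are all assumptions made for illustration, and the actual HaGenLex feature inventory and scoring are not reproduced here. The sketch only shows the core idea that an unknown noun inherits the class of seed nouns with which it shares enough (relation, head) contexts.

```python
from collections import defaultdict


def bootstrap(triples, seeds, min_shared=2, max_rounds=5):
    """Propagate class labels from seed nouns to unknown nouns.

    triples: iterable of (noun, relation, head) tuples, e.g. from a parsed corpus.
    seeds:   dict mapping seed nouns to a known class label (hypothetical labels).
    A noun is labelled when it shares at least `min_shared` contexts with
    exactly one best-scoring class.
    """
    # Collect the set of (relation, head) contexts observed for every noun.
    contexts = defaultdict(set)
    for noun, relation, head in triples:
        contexts[noun].add((relation, head))

    labels = dict(seeds)
    for _ in range(max_rounds):
        # Contexts currently associated with each class.
        label_contexts = defaultdict(set)
        for noun, label in labels.items():
            label_contexts[label] |= contexts.get(noun, set())

        new_labels = {}
        for noun, ctxs in contexts.items():
            if noun in labels:
                continue
            scores = {lab: len(ctxs & lc) for lab, lc in label_contexts.items()}
            if not scores:
                continue
            ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
            best_label, best_score = ranked[0]
            # Skip ties so that ambiguous nouns stay unlabelled.
            is_unique = len(ranked) == 1 or best_score > ranked[1][1]
            if best_score >= min_shared and is_unique:
                new_labels[noun] = best_label

        if not new_labels:
            break  # nothing new was learned in this round
        labels.update(new_labels)
    return labels


# Toy data (invented): object and subject relations of German nouns.
triples = [
    ("Apfel", "obj", "essen"), ("Apfel", "obj", "schälen"),
    ("Birne", "obj", "essen"), ("Birne", "obj", "schälen"),
    ("Lehrer", "subj", "lesen"), ("Lehrer", "subj", "sprechen"),
    ("Ärztin", "subj", "lesen"), ("Ärztin", "subj", "sprechen"),
    ("Stein", "obj", "werfen"),
]
seeds = {"Apfel": "food", "Lehrer": "human"}
result = bootstrap(triples, seeds)
```

In this toy run, "Birne" and "Ärztin" each share two contexts with a seed and are labelled accordingly, while "Stein" shares none and stays unlabelled. Raising `min_shared` trades coverage for precision, which is the kind of parameter effect the abstract refers to.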
Related resources
Automatic Extension of Feature-based Semantic Lexicons via Bootstrapping
This work investigates and improves a bootstrapping approach that makes it possible to extend high-quality lexical resources with the help of large corpora. The emphasis lies on the extraction of lexical-semantic information and word meaning, which are fundamental components of advanced applications such as semantic parsing, information retrieval or text summarization. ... In the last chap...
Semantic Bootstrapping with a Cluster-Based Extension to DIPRE
The practical applications of information extraction are currently limited by the need to hand-construct search patterns and lexicons and / or to have available large labelled training sets. To address this issue, we present a semantic bootstrapping technique based on Brin’s DIPRE algorithm. The basic algorithm is extended by using clustering to group similar occurrences when extracting new pat...
Automatic Extraction of Polar Adjectives for the Creation of Polarity Lexicons
Automatic creation of polarity lexicons is a crucial issue to be solved in order to reduce the time and effort required in the first steps of sentiment analysis. In this paper we present a methodology based on linguistic cues that allows us to automatically discover, extract and label subjective adjectives that should be collected in a domain-based polarity lexicon. For this purpose, we designed a bootstra...
A Bootstrapping Method for Learning Semantic Lexicons using Extraction Pattern Contexts
This paper describes a bootstrapping algorithm called Basilisk that learns high-quality semantic lexicons for multiple categories. Basilisk begins with an unannotated corpus and seed words for each semantic category, which are then bootstrapped to learn new words for each category. Basilisk hypothesizes the semantic class of a word based on collective information over a large body of extraction ...
Relation Guided Bootstrapping of Semantic Lexicons
State-of-the-art bootstrapping systems rely on expert-crafted semantic constraints such as negative categories to reduce semantic drift. Unfortunately, their use introduces a substantial amount of supervised knowledge. We present the Relation Guided Bootstrapping (RGB) algorithm, which simultaneously extracts lexicons and open relationships to guide lexicon growth and reduce semantic drift. Thi...
Publication date: 2006